NSF PAR Search | NSF Public Access Repository

Confronting Reward Model Overoptimization with Constrained RLHF

Moskovitz, T; Singh, A; Strouse, DJ; Sandholm, T; Salakhutdinov, R; Dragan, A; McAleer, S (May 2024, ICLR)

Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to overoptimization, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to solve this issue using constrained reinforcement learning as a means of preventing the agent from exceeding each RM’s threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
Full Text Available
Learning to hallucinate examples from extrinsic and intrinsic supervision.

https://doi.org/10.1109/ICCV48922.2021.00858

Gui, L.; Bardes, A.; Salakhutdinov, R.; Hauptmann, A.; Hebert, M.; Wang, Y.-X. (January 2022, International Conference on Computer Vision)

Full Text Available
Information Obfuscation of Graph Neural Networks

Liao, P; Zhao, H.; Xu, K; Jaakkola, T; Gordon, G; Jegelka, S; Salakhutdinov, R (July 2021, International Conference on Machine Learning (ICML))

While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning with graph-structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance. Our method creates a strong defense against inference attacks, while suffering only a small loss in task performance. Theoretically, we analyze the effectiveness of our framework against a worst-case adversary, and characterize an inherent trade-off between maximizing predictive accuracy and minimizing information leakage. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders for downstream tasks.
Full Text Available
Self-supervised Learning from a Multi-view Perspective

Tsai, Y.-H.; Wu, Y.; Salakhutdinov, R.; Morency, L.-P. (January 2021, Proceedings of the International Conference on Learning Representations (ICLR))
Full Text Available
Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data

https://doi.org/10.18653/v1/2021.acl-long.322

Liang, P.P.; Liu, T.; Cai, A.; Muszynski, M.; Ishii, R.; Allen, N.; Auerbach, R.; Brent, D.; Salakhutdinov, R.; Morency, L-P. (January 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing)

Full Text Available
Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis

Tsai, Y.-H.; Ma, M.; Yang, M.; Salakhutdinov, R.; Morency, L.-P. (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Full Text Available
Neural Methods for Point-wise Dependency Estimation

Tsai, Y.-H.; Zhao, H.; Yamada, M.; Morency, L.-P.; Salakhutdinov, R. (January 2020, Proceedings of the Neural Information Processing Systems Conference (NeurIPS))
Full Text Available
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Liang, P.; Lyu, Y.; Fan, X.; Wu, Z.; Cheng, Y.; Wu, J.; Chen, L.Y.; Wu, P.; Lee, M.A.; Zhu, Y.; et al. (January 2021, Proceedings of the Neural Information Processing Systems Conference (NeurIPS))

Full Text Available
